1 Project Goals

Integrating data and knowledge from multiple heterogeneous sources, each one
possibly with a different underlying data model, is not only an important aspect
of automated reasoning, but also of retrieval systems where queries can span
multiple such sources. These sources can be as different as relational or
deductive databases, object bases, (constraint) knowledge bases, or even
(structured) files and arbitrary program packages encapsulating specific
knowledge, often in a hard-wired form accessible only through function calls.
Many queries can only be answered if data and knowledge from these different
sources are available.

In 1991-92, Gio Wiederhold proposed the pioneering concept of a mediator: a
program that integrates multiple databases. However, while the objectives a
mediator would satisfy were clear, how these objectives would be accomplished
and implemented was not. The principal goal of this project was to develop a
platform for the creation of mediated applications. Such a platform would
provide a mechanism within which mediators may be developed for a variety of
applications. The platform itself would be application independent, but would
provide a variety of underlying technology and services that would be critical
to the success of any specific application involving the use of mediation
technology.

In this project, we have developed a formal, theoretically solid framework for
the creation and deployment of mediators that access distributed data sources,
and shown that this mathematically justified framework scales up to large-scale
applications involving integrated access not only to multiple databases, but
also to multiple data structures and software packages located at diverse
networked sources. The resulting system, called WebHERMES, is accessible to any
user who has access to the World Wide Web through any standard Web browser. This
includes access from Unix workstations, PCs, and Macs, as well as palmtop
computing devices such as the Philips Velo or the US Robotics Pilot.

The organization of this report is as follows. Section II explains the main
scientific contributions of this project. Section III explains the software that
has been developed. Section IV specifies what Educational Objectives have been
accomplished from this project. Section V presents a list of all publications
acknowledging this contract.


2 Awards / Recognition

The  HERMES  project,  and  its  participants,  have  received  significant  recognition  for 
their  work  on  this  project,  from  a  variety  of  external  sources.  These  are  listed  below: 

• National Young Investigator Award, 1993, to V.S. Subrahmanian (PI),
National Science Foundation.

• Maryland Distinguished Young Scientist Award, V.S. Subrahmanian
(PI), Maryland Science Center and the Maryland Academy of Sciences, 1997.

• Association for Computing Machinery (ACM) Washington Chapter Samuel
Alexander Award, 1997, Kasim S. Candan (graduate research assistant funded
by this contract), for an outstanding doctoral dissertation.

• Business Week Magazine highlighted the accomplishments of Sibel Adali, who
received her PhD for her work on this project.

• Publications: Over 30 publications in top-quality, archival scientific
journals, and 16 papers in leading scientific conferences were published due,
in part, to support received under this contract.


3  Scientific  Accomplishments  of  Project 

The  HERMES  (Heterogeneous  Reasoning  and  Mediator  System)  project  was  started 
in  Sep.  1993.  During  this  time,  we  have  developed: 

•  A  language  in  which  mediators  can  be  expressed 

•  A  compiler  within  which  mediators  expressed  in  the  above  language  can  be 
implemented 

•  A  distributed  computation  framework  so  that  the  mediator  compiler  can  access 
data  at  multiple  sites  across  the  network 


• A set of techniques to optimize queries to such distributed heterogeneous
repositories

• A set of techniques to incrementally create materialized mediated views (better
known as data warehouses) consisting of information from multiple sources

• A set of techniques to specify security policies in mediated systems, as well as
process updates in secure mediators

•  Web  client  access  to  mediated  applications 

•  A  unified  framework  for  representing  and  manipulating  multimedia  data  located 
across  the  Internet. 

We will now briefly describe our contributions in each of these areas.

3.1  Mediator  Language 

We have proposed the following concepts for the HERMES mediator language. A
domain, D, is an abstraction of databases and software packages and consists of
three components: (1) a set, S, whose elements may be thought of as the
data-objects that are being manipulated by the package in question; (2) a set F
of functions on S: these functions take objects in S as input and return, as
output, objects from their range (which needs to be specified). The functions in
F may be thought of as the predefined functions that have been implemented in
the software package being considered; (3) a set of relations on the
data-objects in S: intuitively, these relations may be thought of as the
predefined relations in the domain.

In our system, called HERMES ("Heterogeneous Reasoning and Mediator System"),
a domain call is a syntactic expression of the form

    domainname : domainfunction(<argument1, ..., argumentn>)

where domainfunction is the name of the function, and argument1, ..., argumentn
are arguments to that function. Intuitively, a domain call may be read as: in
the domain called domainname, execute the function called domainfunction on the
arguments

    <argument1, ..., argumentn>.

The result of executing this domain call is coerced into a set of entities that
have the same type as the output type of the function domainfunction on the
arguments

    <argument1, ..., argumentn>.

A domain-call atom (DCA-atom) is of the form

    in(X, domainname : domainfunction(<arg1, ..., argn>))

where in is a polymorphic set membership predicate. For example,

    in(A, paradox : select_eq('phonebook', 'name', 'jo smith'))

is a DCA-atom that is true just in case A is a tuple in the result of executing
a selection operation (finding tuples where the NAME field is JO SMITH) on a
relation called PHONEBOOK maintained in a PARADOX database system.
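
The definitions above can be illustrated with a minimal Python sketch. It is not
the HERMES implementation: the class name Domain, the helpers domain_call and
in_, and the toy phonebook data are all assumptions made for illustration. A
domain is modelled as a named collection of functions, a domain call executes
one function and coerces its result into a set, and the in predicate tests
membership of a ground value in that set.

```python
from typing import Any, Callable, Dict, FrozenSet, Tuple

# Hypothetical toy data standing in for a PARADOX relation.
PHONEBOOK = [
    ("jo smith", "555-0100"),
    ("al jones", "555-0101"),
]
FIELDS = {"name": 0, "phone": 1}


def select_eq(relation: str, field: str, value: str) -> list:
    """Return all phonebook tuples whose field equals value."""
    if relation != "phonebook":
        return []
    idx = FIELDS[field]
    return [row for row in PHONEBOOK if row[idx] == value]


class Domain:
    """A named package exposing a set of predefined functions."""

    def __init__(self, name: str, functions: Dict[str, Callable[..., Any]]):
        self.name = name
        self.functions = functions


DOMAINS = {"paradox": Domain("paradox", {"select_eq": select_eq})}


def domain_call(domain: str, function: str, args: Tuple) -> FrozenSet:
    """Execute domain:function(args) and coerce the result into a set."""
    result = DOMAINS[domain].functions[function](*args)
    return frozenset(result)


def in_(x: Any, domain: str, function: str, args: Tuple) -> bool:
    """The DCA-atom in(X, domain:function(args)) for a ground X."""
    return x in domain_call(domain, function, args)


call = ("paradox", "select_eq", ("phonebook", "name", "jo smith"))
print(in_(("jo smith", "555-0100"), *call))
print(in_(("al jones", "555-0101"), *call))
```

In a full system X may also be an unbound variable, in which case the atom would
enumerate the members of the result set rather than test a single value.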

A mediator is a set of rules of the form

    A ← D1 & ... & Dm & A1 & ... & An

where A1, ..., An are atoms, and D1, ..., Dm are DCA-atoms.

We  have  studied  the  syntax  and  semantics  of  this  language  exhaustively,  yielding 
a  clean  amalgamation  of  multiple  databases,  data  structures  and  software  packages. 
We  have  developed  algorithms  that  are  provably  correct  that  answer  queries  to  these 
databases  very  efficiently. 


4  Mediator  Compiler 

We have built a mediator compiler within which queries to HERMES mediators may
be expressed and processed. There are two important aspects to constructing a
mediator: domain integration and semantic integration. Intuitively, domain
integration is the physical linking of the data sources and reasoning systems,
while semantic integration is the coherent extraction and combination of the
information provided by the data and reasoning sources, serving a given purpose.

The HERMES compiler takes as input a mediator expressed in the HERMES language
described in the preceding section, and produces as output a set of data
structures that may be used to process and execute queries in the HERMES query
language. When a user of an application mediator built in HERMES expresses a
query, the mediator rules defined with the application mediator expand the query
into a set of subqueries. Such subqueries may be subqueries either to the HERMES
system itself, or to external data sources accessed by the HERMES mediator. Here
is an example of how a HERMES query is processed.

Example Query: Let rte1 be a ternary predicate such that rte1(O,D,R) is
satisfied iff R is a route from the origin O to an unspecified destination D
such that the destination has an airfield as well as certain types of
ammunition. For this, we may define the following clause in the mediator:

    rte1(O,D,R) ←
        in(P1, paradox : select=(facilities, facility, "airfield")) &
        in(P2, dbase : select=(supplies, item, "ammunition")) &
        =(P1.place, P2.place) &
        in(D, spatial : findpt(P1.place)) &
        in(R, rp : route(O,D)).

To obtain the result from a given location ℓ, we can pose the query

    rte1(ℓ,D,R).
This is then processed as follows: PARADOX is invoked, which SELECTS all tuples
from the facilities relation that have an airfield facility. P1 is then
instantiated to one of the selected tuples. Next, DBASE is queried to SELECT all
tuples from the supplies relation that have the item field set to ammunition. P2
is instantiated to one such tuple. A check is made to see that P1 and P2 have
the same place field. In other words, this ensures that a single place is found
with both ammunition and an airfield. If not, the HERMES inference engine looks
for other possible instantiations of P1 and P2 that satisfy these constraints.
Finally, the xy-location of the place P1.place is computed using the spatial
domain, instantiating D, and RP is called to find a route from the origin ℓ to
D.
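
The processing order just described can be sketched in Python as nested
iteration with backtracking. This is an illustrative sketch only, not the HERMES
inference engine: the toy tables, the place names and coordinates, and the
stand-in functions select_eq, findpt and route are assumptions, and the route
planner is stubbed out as a direct two-point path.

```python
from typing import Dict, Iterator, List, Tuple

Point = Tuple[int, int]

# Hypothetical toy sources; the real system would call PARADOX, DBASE,
# a spatial package and the route planner RP.
FACILITIES = [{"place": "awasa", "facility": "airfield"},
              {"place": "dire", "facility": "airfield"},
              {"place": "gonder", "facility": "hospital"}]
SUPPLIES = [{"place": "dire", "item": "ammunition"},
            {"place": "gonder", "item": "ammunition"}]
LOCATIONS: Dict[str, Point] = {"awasa": (10, 20), "dire": (40, 60), "gonder": (5, 90)}


def select_eq(table: List[dict], field: str, value: str) -> Iterator[dict]:
    """Stand-in for paradox:select= and dbase:select=."""
    return (row for row in table if row[field] == value)


def findpt(place: str) -> Iterator[Point]:
    """Stand-in for spatial:findpt; yields the xy-location of a place."""
    if place in LOCATIONS:
        yield LOCATIONS[place]


def route(origin: Point, dest: Point) -> Iterator[List[Point]]:
    """Stand-in for rp:route; a real planner would return least-cost paths."""
    yield [origin, dest]


def rte1(origin: Point) -> Iterator[Tuple[Point, List[Point]]]:
    """Enumerate (D, R) pairs, trying the body atoms left to right."""
    for p1 in select_eq(FACILITIES, "facility", "airfield"):
        for p2 in select_eq(SUPPLIES, "item", "ammunition"):
            if p1["place"] != p2["place"]:
                continue  # backtrack to the next instantiation of P2
            for d in findpt(p1["place"]):
                for r in route(origin, d):
                    yield d, r


answers = list(rte1((0, 0)))
print(answers)
```

Because rte1 is a generator, further answers are computed only when the caller
asks for them, which mirrors the way the engine looks for other instantiations
of P1 and P2 only when an earlier one fails.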

The HERMES query processing algorithm is a sound and complete algorithm for
processing queries to heterogeneous mediated systems.

5  Query  Optimization  and  Caching 

An important issue that we have studied is ways to make the processing of
queries in heterogeneous reasoning systems more efficient. We advocate the
intelligent use of high-speed caches to avoid computations whenever possible. To
accomplish this we introduce the concept of an "invariant", i.e. an expression
about the known input/output relationships of a program that can be processed by
the mediator. We have shown how such caches may be maintained, and how the query
processing procedure can make better use of these caches, given the knowledge
about different packages, to reduce the complexity of query execution. Our
methods are sound and complete.

It is possible that in the processing of the rules in the mediator, "similar"
function calls to external programs will need to be executed several times since
the same kind of information may be requested over and over by different users.
Backtracking is another reason for such a situation. Calling an external program
is usually a costly operation because of the memory and CPU requirements and
possible network delays. Furthermore, actual packages may levy charges for
accessing them. Suppose there is a way to guess "some" of the answers that will
be returned by an external call. If a refutation is found by substituting one of
these answers, then there is no need to execute the external domain call. This
is accomplished by caching the answers returned by previous external calls and
re-using them when needed. Similarly, if there is a way of knowing that a
function is not defined for some inputs, whenever it is called for these inputs,
we can terminate the search down a path of the search space.

The challenge of this approach lies in deciding which sets of answers are
relevant, and in representing the input/output behavior of some external
functions. This information is stored in the system with the help of some
explicit rules which will be referred to as "invariants". Invariants are
expressions specifying the relation between the set of answers returned by an
external call, its arguments and other possible external calls.
As an example, consider the following invariant:

    T2 > T1 ⇒ f(T2) ⊇ f(T1)

The above expression can be read as follows: if T2 > T1, then all the solutions
of f(T1) are also solutions of f(T2). Hence, if the set of answers for f(T1)
were previously stored in a cache, then these answers can be re-used whenever
the function f(T2) is called; only if none of the answers to f(T1) satisfies the
rest of the query does f(T2) need to be computed. Examples of such invariants
are given below:

Example 1 Suppose relation is a constant in the mediator which refers to a
relational database with the usual selection operators. For example,
select≤(R, F, V) selects all the tuples in table R such that the value of the
field F is less than or equal to V. Then, the following are possible invariants
for different select functions.

    T2 < T1 ⇒ relation : select≤(R, Field, T1) ⊇
              relation : select≤(R, Field, T2).

    T2 > T1 ⇒ relation : select>(R, Field, T2) ⊆
              relation : select>(R, Field, T1).

The second invariant can be read as: for any given table R and field Field in
the domain relation, whenever T2 > T1 is satisfied, all the tuples that are in
relation : select>(R, Field, T2) are also in relation : select>(R, Field, T1).
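
As a rough illustration of how such an invariant can drive cache re-use, the
Python sketch below tries cached partial answers before making an external call.
It is a sketch under stated assumptions, not the HERMES cache manager: the toy
table, the function names select_le, partial_answers and first_match, and the
call counter are hypothetical. It relies only on the first invariant above:
a cached result for a smaller threshold is contained in the result for a larger
one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[str, int]

# Hypothetical relation: (name, age).
TABLE: List[Row] = [("ann", 25), ("bob", 31), ("cat", 47), ("dan", 52)]
external_calls = 0  # counts simulated calls to the external source


def select_le(threshold: int) -> List[Row]:
    """Simulated external call: rows whose age is <= threshold."""
    global external_calls
    external_calls += 1
    return [row for row in TABLE if row[1] <= threshold]


cache: Dict[int, List[Row]] = {}


def partial_answers(threshold: int) -> List[Row]:
    """Cached rows guaranteed to be answers of select_le(threshold).

    Invariant used: T2 < T1 implies select_le(T1) contains select_le(T2),
    so any cached result for a threshold <= the requested one is sound.
    """
    found: List[Row] = []
    for cached_t, rows in cache.items():
        if cached_t <= threshold:
            found.extend(r for r in rows if r not in found)
    return found


def first_match(threshold: int, rest_of_query: Callable[[Row], bool]) -> Optional[Row]:
    """Try cached partial answers first; call the source only if needed."""
    for row in partial_answers(threshold):
        if rest_of_query(row):
            return row
    rows = select_le(threshold)
    cache[threshold] = rows
    return next((r for r in rows if rest_of_query(r)), None)


first = first_match(35, lambda r: r[0].startswith("b"))  # cache empty: external call
second = first_match(50, lambda r: r[1] > 30)            # answered from the cache
third = first_match(50, lambda r: r[1] > 40)             # cache insufficient: external call
print(first, second, third, external_calls)
```

The second query is satisfied without contacting the source, while the third
falls through to a real call because no cached partial answer satisfies the
rest of the query.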

Example 2 Suppose the domain spatial is a spatial data structure such as a point
quadtree storing points in two-dimensional space. The function
vertical_slice(File,X,Dist) in this domain returns all the points that have
X-coordinates between X−Dist and X+Dist, in other words all the points that are
in the vertical slice taken from X−Dist to X+Dist. The following is an invariant
about this function:

    Dist1 < Dist2 ⇒
        spatial : vertical_slice(File,X,Dist2) ⊇
        spatial : vertical_slice(File,X,Dist1)

which states that whenever the X-coordinate is fixed, the points in a vertical
slice are contained in any of the bigger vertical slices. We can easily write
similar invariants for other spatial functions. The invariant for the
horizontal_slice function is the same as for vertical_slice. As for the
range(File,X,Y,Dist) function, which returns all the points that are within
distance Dist of point (X,Y) (i.e. all points (X1,Y1) such that
(X − X1)² + (Y − Y1)² ≤ Dist²), we can write the following invariants:

    Dist1 < Dist2 ⇒
        spatial : range(File,X,Y,Dist2) ⊇
        spatial : range(File,X,Y,Dist1).

    |X1 − X2| < Dist2 − Dist1 ⇒
        spatial : range(File,X2,Y,Dist2) ⊇
        spatial : range(File,X1,Y,Dist1).
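
The spatial invariants can be checked numerically on a small point set. The
Python sketch below is an illustration only: the point set, the function names
and the brute-force search over integer parameters are assumptions, and a plain
list stands in for the point quadtree. For the shifted-centre invariant the
sketch assumes the premise in the form |X1 − X2| ≤ Dist2 − Dist1, which is the
form under which the larger circle geometrically contains the smaller one.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]

# Hypothetical point set standing in for the contents of a point quadtree file.
POINTS: List[Point] = [(0, 0), (1, 1), (2, 0), (3, 4), (5, 5), (-2, 1)]


def vertical_slice(points: List[Point], x: float, dist: float) -> Set[Point]:
    """Points whose x-coordinate lies between x - dist and x + dist."""
    return {p for p in points if x - dist <= p[0] <= x + dist}


def range_query(points: List[Point], x: float, y: float, dist: float) -> Set[Point]:
    """Points within distance dist of (x, y)."""
    return {p for p in points if (x - p[0]) ** 2 + (y - p[1]) ** 2 <= dist ** 2}


def slice_invariant_holds(x: float, d1: float, d2: float) -> bool:
    """Dist1 < Dist2 implies the wider slice contains the narrower one."""
    if not d1 < d2:
        return True  # premise false, nothing to check
    return vertical_slice(POINTS, x, d2) >= vertical_slice(POINTS, x, d1)


def range_invariant_holds(x1: float, x2: float, y: float, d1: float, d2: float) -> bool:
    """|X1 - X2| <= Dist2 - Dist1 implies range at X2 contains range at X1."""
    if abs(x1 - x2) > d2 - d1:
        return True  # premise false, nothing to check
    return range_query(POINTS, x2, y, d2) >= range_query(POINTS, x1, y, d1)


checks = [
    slice_invariant_holds(x, d1, d2)
    for x in range(-3, 6) for d1 in range(0, 4) for d2 in range(0, 6)
] + [
    range_invariant_holds(x1, x2, 1, d1, d2)
    for x1 in range(-3, 6) for x2 in range(-3, 6)
    for d1 in range(0, 4) for d2 in range(0, 7)
]
print(all(checks))
```

A check of this kind does not prove an invariant, but it is a cheap way to catch
a mis-stated one before it is registered with the mediator.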
Our research proposes a cost-based optimization technique that caches statistics
of actual calls to the sources and consequently estimates the cost of the
possible execution plans based on the statistics cache. We investigate issues
pertaining to the design of the statistics cache and experimentally analyze
various tradeoffs. We also present a query result caching mechanism that allows
us to effectively use results of prior queries when the source is not readily
available. We employ the novel invariants mechanism, which shows how semantic
information about data sources may be used to discover cached query results of
interest.
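
The idea of estimating plan costs from a statistics cache can be sketched as
follows. This is a simplified illustration, not the cost model developed in the
project: the recorded timings and result sizes are invented, the function names
are hypothetical, and the cost formula assumes plain nested-loop evaluation in
which each call is repeated once per binding produced by the calls before it.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Statistics cache: observed (seconds, result size) per domain function.
# All numbers recorded below are hypothetical.
stats: Dict[str, List[Tuple[float, int]]] = defaultdict(list)


def record_call(function: str, seconds: float, result_size: int) -> None:
    """Store the observed running time and cardinality of one actual call."""
    stats[function].append((seconds, result_size))


def estimate(function: str) -> Tuple[float, float]:
    """Average observed time and cardinality; pessimistic default if unseen."""
    observed = stats.get(function)
    if not observed:
        return 10.0, 100.0
    return mean([t for t, _ in observed]), mean([n for _, n in observed])


def plan_cost(plan: List[str]) -> float:
    """Estimated cost of executing the calls in the given order."""
    cost, bindings = 0.0, 1.0
    for function in plan:
        seconds, size = estimate(function)
        cost += bindings * seconds   # one call per binding produced so far
        bindings *= size             # each call multiplies the bindings
    return cost


record_call("paradox:select", 0.25, 50)
record_call("paradox:select", 0.5, 30)
record_call("dbase:select", 1.0, 2)
record_call("dbase:select", 1.0, 4)

plans = [["paradox:select", "dbase:select"],
         ["dbase:select", "paradox:select"]]
best = min(plans, key=plan_cost)
print(best, plan_cost(best))
```

With these invented statistics, the plan that starts with the slower but more
selective call is estimated to be cheaper, because it is followed by far fewer
repeated calls.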

6 Maintaining Mediated Views/Warehouses

A mediated materialized view (often called a data warehouse) is a view of a body
of distributed heterogeneous data that is precomputed and stored as a cache of
the sort described in the preceding section. As in the case of traditional
views, mediated views are materialized for efficiency reasons. A materialized
view can be affected by two kinds of updates, namely updates to the materialized
view, and updates to the underlying sources.

If an update of the first kind occurs to a view, whether materialized or not,
the problem of reflecting the update correctly by changing the base tables
appropriately needs to be addressed. This problem is called the view update
problem and has been discussed extensively for relational, deductive, and
object-oriented databases. However, our objective is slightly different. We do
not necessarily assume that an update occurring to a view has to be reflected
within some underlying source. Instead, we assume that the view itself (or, to
be more precise, its definition) is affected by the update. This kind of update
affecting the view's definition is typically not treated within the view update
literature. One exception is deductive databases, where the addition or deletion
of rules to the definition of an intensional predicate is discussed by Teniente.
However, that work neither materializes nor preprocesses the view for efficiency
reasons.

Within the traditional context, the second case occurs if an update to a base
table occurs which possibly affects a materialized view. The resulting problem,
preserving the consistency of the view, is called view maintenance. However,
since we do not necessarily materialize the view upon the underlying sources of
our mediated views but instead perform materialization by unfolding the view
definition as independently as possible from the underlying sources, the
traditional view maintenance problem presents itself quite differently to us.
Hence, the traditional view maintenance problem and our problem do not intersect
but complement each other.

Subsequently, we treat both kinds of updates to materialized mediated views and
show how they can be handled efficiently. More specifically, the primary aim is
to specify how to efficiently maintain views of mediated systems such as those
that may be constructed in HERMES when insertion and deletion requests of both
of the above two kinds are made. As in the standard case, a materialized view in
mediated systems may be thought of as a set of facts that can be concluded from
the mediator rules. However, we show that, more generally, a materialized
mediated view may be regarded as a set of constrained atoms that are not
necessarily ground. Taking materialized views to be sets of constrained atoms
leads to a number of advantages:

1. First of all, it allows us to perform updates to constrained databases as
well as mediated systems. To our knowledge, there are currently no methods to
incrementally maintain views in constrained databases.

2. We show for updates of the second kind that even in the case of unconstrained
databases, such as those considered by Gupta, Mumick and Subrahmanian (whose
work, we have been told, is now used by AT&T for billing purposes), this
approach leads to a simpler and more efficient deletion algorithm than the DRed
deletion algorithm presented in earlier work.

In other words, not only have we developed efficient algorithms for view
management, these algorithms also (in some cases) improve upon existing
algorithms for view management in traditional relational databases.

7  Ontology  Management 

Any mediator, in integrating heterogeneous sources, has to resolve both
syntactic and semantic conflicts between (the data in) the disparate sources.
While considerable
work  has  been  done  on  this  problem  in  the  context  of  multi-database  systems,  little 
algorithmic  support  has  been  developed  for  resolving  (especially  semantic)  conflicts, 
and  currently,  resolving  them  is  largely  a  responsibility  of  the  mediator  developer. 
We  develop  appropriate  concepts  and  algorithms  for  solving  the  following  problems. 

•  Resolving  the  conflicts  between  data  coming  from  heterogeneous  sources 

•  Allowing  users  to  personalize  queries  so  as  to  address  their  own  needs  (e.g.,  a 
user  from  India  might  want  prices  returned  in  Indian  Rupees). 

•  Answering  personalized  queries. 

• Maintaining a mediator against changes to the data sources, in the form of
restructuring. Such restructuring may be motivated by the requirements of
the local user community of the source. Our ideas and techniques apply to
any mediator framework, such as the TSIMMIS project at Stanford University,
the HERMES project at University of Maryland, the SchemaLog project at
Concordia University, the Disco project at Inria, and several others.

8  Security  in  Mediated  Systems 

Over the last few years, there has been considerable work on security in
databases. Most of this work has been limited to the realm of relational
databases, though of late, some work has been done on extending these security
paradigms to object-oriented databases, deductive databases, and other
paradigms. Castano et al. provide a comprehensive description of related work.
Despite the differences in the underlying data paradigm, all these frameworks
share a single trait that we (cynically?) term the principle of paranoia.

The Principle of Paranoia. The DBMS must take all steps necessary
in order to ensure that the user u cannot infer any item in a pre-designated
set S(u) of items that are to be kept secret from the user.


However, with the evolution of the information superhighway, there is now an
immense amount of information available in a very wide variety of databases.
Wiederhold has proposed the concept of a mediator: intuitively, a mediator is a
program that integrates multiple databases. Consider a mediator program M that
integrates some software packages P1, ..., Pk. Each of the packages P1, ..., Pk
may enforce its own unique local security policy. Some may represent completely
"open-source" software/data, while others may place certain restrictions on the
use of certain facilities and/or data residing within them. In contrast to the
principle of paranoia commonly enforced in ordinary databases, mediated systems
must attempt to be maximally cooperative to the user, yet at the same time, they
must respect the security constraints of the individual databases/packages
participating in the mediated system. Thus, for instance, two packages P1 and P2
may both be able to satisfy a user's request; however, package P1 uses secure
data, while package P2 uses open-source data. In this case, the mediated system
may reasonably use package P2 to respond to the user's query, even though
package P1 feels this data should be kept hidden from the user. Notice that in
this case, the user could directly query P2 and get the data without using the
mediator at all, so the mediator might as well do it for him, unless a global
security condition maintained by the mediator prevents this. Thus, in the case
of mediated systems, we may wish to implement a policy of cautious cooperation.

The Principle of Cautious Cooperation. If a user's query can be answered
using open-source information, then the mediator will answer the query
unless doing so will directly violate the system's global security constraints.
However, the system will always respect the rights of individual packages
participating in the mediated system, and ensure that no single package violates
its own local security policy.
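
The principle can be illustrated with a small Python sketch. It is a simplified
model for exposition and not the technique developed in this project: the
Package class, the mediate function, the item names and the representation of
policies as sets and predicates are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class Package:
    name: str
    open_items: Set[str]   # items the package's local policy lets this user see
    data: dict             # item -> value held by the package

    def answer(self, item: str) -> Optional[str]:
        """Answer only if the local policy permits; the mediator never overrides it."""
        if item in self.open_items:
            return self.data.get(item)
        return None


def mediate(item: str, packages: List[Package],
            globally_forbidden: Callable[[str], bool]) -> Optional[str]:
    """Cautious cooperation: refuse on a direct global violation, otherwise
    answer from any package whose own policy allows it."""
    if globally_forbidden(item):
        return None
    for package in packages:
        value = package.answer(item)
        if value is not None:
            return value
    return None


# Hypothetical packages: P1 keeps the item secret, P2 holds it as open data.
p1 = Package("P1", open_items=set(), data={"depot_location": "grid 42"})
p2 = Package("P2", open_items={"depot_location"}, data={"depot_location": "grid 42"})

print(mediate("depot_location", [p1, p2], lambda item: False))
print(mediate("depot_location", [p1], lambda item: False))
print(mediate("depot_location", [p1, p2], lambda item: item == "depot_location"))
```

The first request is answered through P2 even though P1 would refuse it; the
second is refused because only P1 holds the item; the third is refused because
the global constraint forbids it outright.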

The principle of cautious cooperation ensures that a given query will cause no
direct violation of global integrity constraints, but may leave the path open
for future violations or for inferential violations. A slightly more
conservative policy, that we term the principle of conservative cautious
cooperation, will answer a query posed by the user iff the answer to that query
will not yield an "inference path" (sequence of open-source queries coupled with
logical reasoning) that the user may use to violate security (unless such a path
existed prior to the query being issued by the user).

The Principle of Conservative Cautious Cooperation. If a user's query can
be answered using open-source information, then the mediator will answer the
query unless doing so will cause there to be a sequence of queries (that only
reflect open-source accesses) such that if the user asks this sequence of
queries, then he will be able to violate the system's global security
constraints. However, the system will always respect the rights of individual
packages participating in the mediated system, and ensure that no single package
violates its own local security policy.

In our research, we have developed techniques by which mediators may be
efficiently and scalably extended to encode the principle of paranoia and the
principle of cautious cooperation, as well as the principle of conservative
cautious cooperation.


9  Web  Access  to  Mediator  Technology 

We have developed algorithms which take as input a HERMES mediator M, and
generate as output an HTML form that can be used to query mediator M by any
user who has access to the World Wide Web through standard Web browsers such as
Netscape and Microsoft Explorer. In particular, as a consequence of this
Web-page generation module, users with Web browsers on:

•  Unix  workstations  (e.g.  SUNs  and  DECs) 

•  PC  devices  (e.g.  IBM-PC  compatibles) 

• Wireless/cellular palmtop devices (e.g. the Philips Velo, US Robotics Pilot)

can now access WebHERMES from such devices.
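
A form generator of this kind can be sketched in a few lines of Python. The
sketch only illustrates the idea of deriving a query form from a mediator
predicate; it is not the WebHERMES generation module, and the function name
mediator_form, the form layout and the action URL /cgi-bin/hermes are
hypothetical.

```python
from html import escape
from typing import List


def mediator_form(predicate: str, arguments: List[str], action: str) -> str:
    """Build an HTML form with one text field per predicate argument."""
    rows = [
        f'  <label>{escape(arg)} <input type="text" name="{escape(arg)}"></label><br>'
        for arg in arguments
    ]
    return "\n".join(
        [f'<form method="get" action="{escape(action)}">',
         f'  <input type="hidden" name="predicate" value="{escape(predicate)}">',
         *rows,
         '  <input type="submit" value="Query">',
         '</form>']
    )


form = mediator_form("rte1", ["O", "D", "R"], "/cgi-bin/hermes")
print(form)
```

Leaving a field empty would correspond to an unbound variable in the query, so
the same form can serve queries with different binding patterns.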


10  Heterogeneous  Multimedia  Databases 

Though numerous multimedia systems exist in the commercial market today,
relatively little work has been done on developing the mathematical foundations
of multimedia technology. We attempt to take some initial steps towards the
development of a theoretical basis for multimedia information systems. To do so,
we develop the notion of a structured multimedia database system. We begin by
defining a mathematical model of a media-instance. A media-instance may be
thought of as "glue" residing on top of a specific physical media-representation
(such as video, audio, documents, etc.). Using this "glue", it is possible to
define a general purpose logical query language to query multimedia data. This
glue consists of a set of "states" (e.g. video frames, audio tracks, etc.) and
"features", together with relationships between states and/or features. A
structured multimedia database system imposes a certain mathematical structure
on the set of features/states. Using this notion of a structure, we are able to
define indexing structures for processing queries, methods to relax queries when
answers do not exist to those queries, as well as sound, complete and
terminating procedures to answer such queries (and their relaxations, when
appropriate). We show how a media-presentation can be generated by processing a
sequence of queries, and furthermore we show that when these queries are
extended to include constraints, they can not only generate presentations, but
also generate temporal synchronization properties and spatial layout properties
for such presentations. We describe the architecture of a prototype multimedia
database system based on these principles.


11  An  Application  to  Terrain  Reasoning 

The work described here was done jointly with researchers at the US Army
Topographic and Engineering Center in Ft. Belvoir, VA.

In  this  section,  we  will  describe  an  application  of  our  work  to  intelligent  terrain 
reasoning  that  involves  integrating  terrain  map  data,  relational  data,  and  planning 
packages  (developed  at  the  US  Army  Corps  of  Engineers).  The  purpose  of  such  an 
integrated  system  is  many-fold.  It  can  be  used  as  a  basis  for  vehicular  navigation 
in  disaster  relief  situations  (e.g.  floods,  earthquakes,  volcanic  disasters,  etc.),  as  well 
as in military mission planning applications. In these applications, a user, who
may either be a human or may be an autonomous vehicle, may be interested in
posing queries of the following types:

•  (Unknown  Destination)  Given  a  location,  find  a  place  that  has  an  airfield 
as  well  as  certain  types  of  ammunition.  Presumably  these  resources  are  needed 
in  order  for  the  autonomous/manned  vehicle  to  satisfy  its  mission. 

•  (Route  Properties)  Furthermore,  no  point  in  the  route  may  be  less  than  4 
miles  from  an  enemy  outpost.  In  this  example,  in  addition  to  the  fact  that  the 
destination  is  unknown,  we  have  the  fact  that  the  query  asks  not  only  for  a 
route  to  this  unknown  destination  point,  but  it  asks  for  a  route  that  satisfies 
certain  desiderata,  i.e.  which  satisfies  certain  conditions  that  require  accessing 
external  databases  (e.g.  to  figure  out  where  enemy  outposts  lie). 

A route planner (which we will call RP) has been implemented at the US Army
Topographic and Engineering Center. Given two points, this route planner will
find an optimal, least-cost path between these two points (if one exists). Thus,
for instance, the query

    rp : route((35, 70), (200, 98))

returns the set of least-cost paths from the origin point (35, 70) to the
destination point (200, 98) that are found by the Army's route planner.

We illustrate how this example may be solved within the HERMES framework, using
RP as a domain. For this, let us suppose that we have a relational (PARADOX)
database containing a relation called facilities having the schema
(Name, Facility). Thus, this relation may contain a tuple of the form
(awasa, airport) denoting that the place, Awasa, has an airport. Other tuples in
the relation facilities may be similarly interpreted. Suppose there is another
(DBASE) database containing a relation called supplies having the schema
(Place, Item); an example tuple in this relation is (awasa, gas), specifying
that gas is available at Awasa¹.

Example Query:

    rte2(O,D,R) ← rte1(O,D,R) & good(R).
    good(nil) ←
    good(cons(H,T)) ← goodpoint(H) & good(T).
    goodpoint(H) ← is({}, spatial : range(H,4)).

Note the use of the special HERMES predicate is. The predicate rte1 has been
described earlier.
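
The recursive definition of good can be mirrored directly in Python. The sketch
below is an illustration under stated assumptions rather than HERMES code: the
outpost coordinates and candidate routes are invented, spatial_range is a
stand-in for the spatial domain call, and only the points listed in a route are
checked, not the segments between them.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical enemy outposts, coordinates in miles.
OUTPOSTS: List[Point] = [(10.0, 10.0), (30.0, 5.0)]


def spatial_range(point: Point, dist: float) -> List[Point]:
    """Stand-in for spatial:range(H, 4): outposts within dist of point."""
    return [o for o in OUTPOSTS if hypot(o[0] - point[0], o[1] - point[1]) <= dist]


def goodpoint(point: Point) -> bool:
    """goodpoint(H) holds iff no outpost lies within 4 miles of H."""
    return spatial_range(point, 4.0) == []


def good(route: List[Point]) -> bool:
    """good(nil) holds; good(cons(H, T)) holds iff goodpoint(H) and good(T)."""
    if not route:
        return True
    head, tail = route[0], route[1:]
    return goodpoint(head) and good(tail)


def rte2(candidates: List[List[Point]]) -> Optional[List[Point]]:
    """Return the first candidate route (as produced by rte1) that is good."""
    return next((r for r in candidates if good(r)), None)


routes = [
    [(0.0, 0.0), (8.0, 8.0), (20.0, 20.0)],   # passes within 4 miles of (10, 10)
    [(0.0, 0.0), (0.0, 20.0), (20.0, 20.0)],  # stays clear of both outposts
]
print(rte2(routes))
```

The comparison of the range result with the empty list plays the role of the
atom is({}, spatial : range(H,4)) in the rule for goodpoint.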

12  Software  Developed 

Appendix  A  contains  a  complete  user  manual  of  the  HERMES  software. 


13  Educational  Accomplishments 

During  the  pursuit  of  this  research,  we  have  accomplished  the  following  milestones: 

•  V.S.  Subrahmanian  (PI)  received  the  NSF  National  Young  Investigator  Award. 

•  V.S.  Subrahmanian  (PI)  received  the  Maryland  Distinguished  Young  Scientist 
Award  (Maryland  Academy  of  Sciences). 

• Two PhDs were granted:

  - Sibel Adali received her PhD in 1996 and is currently an Assistant Professor
    of Computer Science at Rensselaer Polytechnic Institute in Troy, NY.

  - Kasim S. Candan received his PhD in 1997 and is currently an Assistant
    Professor of Computer Science at Arizona State University in Tempe, AZ.

• Kasim S. Candan (student supported by this contract) received the 1997 ACM
Samuel Alexander Award for an outstanding dissertation.

• Sibel Adali's work supported by this contract received Press Highlights in
Business Week Magazine, June 23, 1997.

• The following students supported in part by this contract received Master's
degrees: Charlie Ward, Vadim Kagan.

¹ In practice, these relations will contain much more detail, but we keep them
simple here in order to facilitate presentation.


